1.  Introduction.


In a recent series of papers ((1981), (1984a), and (1984b)) Bradley Efron has suggested a number of methods for constructing confidence intervals for a real valued parameter $\theta$ using the bootstrap. In increasing order of generality, these are the Percentile interval, the Bias Corrected Percentile (BC) interval and the Bias Corrected Percentile Acceleration (BCa) interval. Each of these intervals is constructed from the bootstrap distribution of a statistic $\hat\theta$.

The usual (non-parametric) bootstrap works by sampling from the empirical distribution function $\hat F_n$; accordingly, confidence intervals derived from the bootstrap are designed for non-parametric problems. It is difficult, however, to define a 'correct' confidence interval in the non-parametric setting and this quantity is needed in order to measure the performance of a confidence interval procedure. Thus to assess the quality of the bootstrap intervals, Efron moves to a different arena, that of one-parameter families. In this setting, one can construct an interval with the desired coverage by inverting the most powerful test at each parameter value. Efron takes this exact interval as the gold standard and considers the parametric versions of the bootstrap intervals, that is, those obtained from the 'parametric' bootstrap (sampling from the parametric m.l.e. instead of $\hat F_n$). Efron shows that the most general of these intervals, the BCa interval, is second order correct; that is, its endpoints differ from the exact interval by $O_p(1/n)$.


This provides a strong justification for the BCa interval. Standard confidence intervals of the form

    $(\hat\theta + z^{(\alpha)}\hat\sigma,\ \hat\theta + z^{(1-\alpha)}\hat\sigma)$    (1.1)

differ from the exact interval by $O_p(1/n^{1/2})$. (In the above, $\hat\sigma$ is an estimate of the standard deviation of $\hat\theta$.) The $O_p(1/n^{1/2})$ term can cause the exact interval to be asymmetric, an effect picked up by the BCa interval but not by the standard intervals or by studentized intervals, both of which are symmetric by definition. While Efron does not show that the non-parametric BCa interval is second order correct, he hypothesizes that given a reasonable definition of this notion, it will be.

Underlying the BCa interval is a transformation of the problem to a Normal Scaled Translation Family (Efron (1982)) of the form $\theta + (1 + a\theta)Z$ where Z is a N(0,1) random variable. Although computation of the BCa interval doesn't require specification of this transformation, Efron shows that a) if such a transformation exists, the BCa interval equals the exact interval, and b) the BCa interval is second order correct in any one parameter problem, so that loosely speaking, to second order, such a transformation always exists.

In this paper we show how to construct this transformation in general. It turns out to be a variance stabilizing transformation followed by a skewness reducing transformation. This construction produces the following benefits: 1) it sheds light on how the BCa interval works and 2) it produces a new interval (we call it the "BCa°" interval), equal to the BCa interval to second order, which can be computed without bootstrap sampling. We also derive from 2) a second order approximation to the bootstrap distribution of the statistic that doesn't require bootstrap sampling. Both the new interval and the approximation require only n+2 evaluations of the statistic. The transformation generalizes the one constructed by Efron (1984b, section 10) for translation families.

The layout of this paper is as follows. In section 2 we concentrate on one parameter problems. We review the BCa interval and its relation to the exact interval. The BCa° interval is defined and shown to equal (to second order) the BCa interval. Some numerical examples are given. In section 3 we discuss confidence intervals for multiparameter problems, and section 4 focusses on the non-parametric problem. We show how the BCa° interval can be computed without bootstrap sampling and give a number of examples. Section 5 shows how the bootstrap distribution of a statistic can be approximated using the tools developed earlier. Finally, in section 6 we provide proofs of the results quoted throughout.


2.  Confidence  Intervals  for  One  Parameter  Problems. 

2.1  The  Bootstrap  Method 

We  begin  with  a  statement  of  the  bootstrap  method.  The  notation  in  this  paper  will  follow  that  of  Efron 
(1984b)  as  closely  as  possible. 

Let $y = (x_1, x_2, \ldots, x_n)$ represent the available data with each $x_i$ assumed to be an independent realization from an unknown probability distribution $F_\eta$. Here $\eta$ is the parameter vector and the parameter of interest is some functional $\theta = t(F_\eta)$. We have a point estimate $\hat\theta = t(\hat F_\eta)$, where $\hat F_\eta$ is some estimate of $F_\eta$, and would like a confidence interval for $\theta$. The bootstrap method works by resampling from $\hat F_\eta$. There are three distinct resampling strategies depending on the choice of $\hat F_\eta$:

1) One parameter problems. Here we assume that $\theta$ is the only unknown parameter, so that each $x_i$ has distribution $F_\theta$. Resampling is done from $F_{\hat\theta}$ where $\hat\theta$ is typically the maximum likelihood estimate of $\theta$. This is known as the "parametric bootstrap".

2) Multiparameter problems. We take $\hat\eta$ equal to the maximum likelihood estimate of $\eta$ and resample from $F_{\hat\eta}$. This is a multiparameter parametric bootstrap.

3) Non-parametric problems. $F_\eta$ can be any distribution, so we estimate it by the empirical distribution function $\hat F_n$, the non-parametric maximum likelihood estimator of $F_\eta$. Resampling from $\hat F_n$ is equivalent to sampling with replacement from the original data $x_1, x_2, \ldots, x_n$. This is the usual (non-parametric) bootstrap.
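
The difference between strategies 1) and 3) can be sketched in a few lines of Python. This is only an illustration: the N(0, theta) model, the function names and the toy data are assumptions made for the sketch and are not taken from the paper.

```python
import random

def parametric_resample(n, theta_hat, rng):
    """Strategy 1): draw n values from the fitted model F_theta_hat.
    The N(0, theta_hat) model is an assumption made for illustration."""
    sd = theta_hat ** 0.5
    return [rng.gauss(0.0, sd) for _ in range(n)]

def nonparametric_resample(data, rng):
    """Strategy 3): sample with replacement from the observed data,
    i.e. draw from the empirical distribution function."""
    return [rng.choice(data) for _ in data]

rng = random.Random(0)                      # fixed seed so the sketch is repeatable
data = [0.3, -1.2, 0.8, 2.1, -0.5]          # hypothetical observations
theta_hat = sum(x * x for x in data) / len(data)  # variance m.l.e. when the mean is known to be 0
y_par = parametric_resample(len(data), theta_hat, rng)
y_non = nonparametric_resample(data, rng)
```

A parametric bootstrap data set can contain values never observed, while a non-parametric one only reuses the original observations.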


2.2  The  BCa  Interval. 

Efron's BCa interval uses bootstrap sampling to construct an approximate $1-2\alpha$ confidence interval for $\theta$. Depending on the choice of $\hat F_\eta$ in steps a) and b) of the following algorithm, the intervals will apply to situations 1), 2) or 3). The BCa interval is computed as follows:

a) Bootstrap data sets $y^*_1, y^*_2, \ldots, y^*_B$ are created by resampling from $\hat F_\eta$.

b) For each $y^*_b$, $b = 1, 2, \ldots, B$, the bootstrap estimate $\hat\theta^*_b = t(\hat F^*_b)$ is calculated, where $\hat F^*_b$ is the estimate of $F_\eta$ based on $y^*_b$.

c) The bootstrap distribution of the $\hat\theta^*_b$ values is constructed,

    $\hat G(s) = \#\{\hat\theta^*_b < s\}/B$    (2.1)

d) The bias correction

    $z_0 = \Phi^{-1}(\hat G(\hat\theta))$    (2.2)

is computed, $\Phi(.)$ being the cdf of the standard normal.

e) The acceleration constant a is computed (details later).

f) The BCa interval is then given by

    $[\hat G^{-1}(\Phi(z[\alpha])),\ \hat G^{-1}(\Phi(z[1-\alpha]))]$    (2.3)

where $z[\alpha] = z_0 + (z_0 + z^{(\alpha)})/(1 - a(z_0 + z^{(\alpha)}))$ and $z^{(\alpha)} = \Phi^{-1}(\alpha)$.

We note that when a = 0, (2.3) reduces to Efron's BC (Bias-corrected) percentile interval, and if also $z_0 = 0$, then (2.3) is simply $[\hat G^{-1}(\alpha), \hat G^{-1}(1-\alpha)]$, the percentile interval.
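
Steps c), d) and f) can be sketched as follows, assuming the bootstrap values and the acceleration constant a are already available. The function names and the convention used for the empirical quantile are choices made for this sketch, not part of Efron's definition.

```python
import math
from statistics import NormalDist

_STD = NormalDist()  # standard normal: cdf is Phi, inv_cdf is Phi^{-1}

def empirical_quantile(sorted_vals, p):
    """Smallest order statistic whose empirical cdf is at least p
    (one common convention for the inverse of G)."""
    b = len(sorted_vals)
    k = min(max(math.ceil(p * b), 1), b)
    return sorted_vals[k - 1]

def bca_interval(theta_hat, boot, a, alpha):
    """Approximate 1 - 2*alpha BCa interval from bootstrap values `boot`."""
    vals = sorted(boot)
    b = len(vals)
    # step c) and d): G(theta_hat) and the bias correction z0
    g_hat = sum(v < theta_hat for v in vals) / b
    z0 = _STD.inv_cdf(g_hat)  # needs 0 < g_hat < 1

    def endpoint(level):
        z = _STD.inv_cdf(level)
        z_adj = z0 + (z0 + z) / (1.0 - a * (z0 + z))   # z[alpha]
        return empirical_quantile(vals, _STD.cdf(z_adj))

    # step f): both endpoints
    return endpoint(alpha), endpoint(1.0 - alpha)

# With a = 0 and z0 = 0 the interval is just the percentile interval.
percentile_case = bca_interval(50, list(range(100)), 0.0, 0.055)
```

In the toy call the bootstrap values are 0, 1, ..., 99 and exactly half lie below 50, so $z_0 = 0$ and the result is the pair of plain percentiles.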

For the remainder of this section, we will be discussing the parametric BCa interval, that is, with $\hat F_\eta = F_{\hat\theta}$. Sections 3 and 4 will discuss the multiparameter parametric BCa and the non-parametric BCa respectively.


Where does the complicated looking formula (2.3) come from? Recall that standard confidence intervals (1.1) are based on the assumption

    $(\hat\theta - \theta)/\hat\sigma \sim N(0,1)$    (2.4)

The BCa interval is based on a more general assumption:

    $g(\hat\theta) - g(\theta) \sim N(-z_0,\ (1 + a g(\theta))^2)$    (2.5)

where g(.) is a monotone transformation. In (2.4) it is assumed that on the given scale, the standardized statistic is normal with constant variance. In (2.5), we only assume that on some transformed scale, the standardized statistic is normal, possibly with some bias and possibly with a standard deviation changing linearly with the parameter. Efron proves two facts about the BCa interval:

1) If (2.5) holds for some g(.), then the BCa interval is correct.

2) For any one parameter problem, the BCa interval is second order correct. This means roughly that any one parameter problem can be approximately put in form (2.5).

Here, in more detail, is what is meant by 1) and 2). One can show that if (2.5) holds then the problem can be further transformed into a translation problem. The transformation used is $h(t) = (1/a)\log(1 + at)$. The transformed problem is

    $\hat\zeta = \zeta + W$    (2.6)

where

    $\hat\zeta = (1/a)\log(1 + a g(\hat\theta))$
    $\zeta = (1/a)\log(1 + a g(\theta))$
    $W = (1/a)\log(1 + a(Z - z_0))$

Z being a N(0,1) random variable. On the $\zeta$ scale an 'exact' interval can be constructed by inverting the pivotal $\hat\zeta - \zeta$. Transforming back to the g(.) scale then gives the BCa interval. This is the meaning of 1). Fact 2) refers to a comparison of the BCa interval with the exact interval for any one parameter problem. If we are in a one-parameter problem, then the statistic $\hat\theta$ has a distribution depending only on $\theta$, say $f_\theta$. Now suppose that the $100(1-\alpha)$th percentile of $\hat\theta$ as a function of $\theta$, say $\hat\theta_\theta(\alpha)$, is a continuously increasing function of $\theta$ for any fixed $\alpha$. Then the usual exact confidence interval (constructed by inverting the size $\alpha$ most powerful test at each $\theta$) is $(\theta_{ex}[\alpha], \theta_{ex}[1-\alpha])$ where $\theta_{ex}[\alpha]$ is the value of $\theta$ satisfying $\hat\theta_\theta(\alpha) = \hat\theta$. Then Efron shows

    $(\theta_{BCa}[\alpha] - \theta_{ex}[\alpha])/\hat\sigma = O_p(1/n)$    (2.7)

where $\theta_{BCa}[\alpha]$ is the endpoint of the BCa interval. By comparison, the endpoints of the standard interval (1.1) differ from the exact ones by $O_p(n^{-1/2})$.

What makes the BCa interval attractive is that one doesn't need to know the transformation g(.) to construct the interval! Looking back at (2.3), we see that 3 things are needed: the bootstrap distribution of $\hat\theta$ ($\hat G$), the bias constant $z_0$ and the acceleration constant a. As mentioned earlier, the bias term $z_0$ is estimated by $\Phi^{-1}(P(\hat\theta^* < \hat\theta))$. Note that $P(g(\hat\theta^*) < g(\hat\theta)) = P(\hat\theta^* < \hat\theta)$ for any monotone increasing g(.), so the bias is transformation invariant. It turns out that $z_0$ is typically $O_p(n^{-1/2})$.

We have still to discuss the acceleration constant a. From (2.5) we see that a measures how fast the standard deviation of $g(\hat\theta)$ is changing with respect to $g(\theta)$. Like $z_0$, a is typically $O_p(n^{-1/2})$. Efron shows that a can be estimated by

    $a = \mathrm{SKEW}_{\theta=\hat\theta}(\dot l_\theta(\hat\theta))/6$    (2.8)

Here $\dot l_\theta(\hat\theta) = (d/d\theta)(\log f_\theta)$ evaluated at $\theta = \hat\theta$, and $\mathrm{SKEW}_{\theta=\hat\theta}(Z)$ represents the skewness of the random variable Z under the distribution governed by $\theta = \hat\theta$. As is the case with the other two components, computation of (2.8) doesn't require knowledge of g(.). It can be computed analytically for some simple cases and requires parametric bootstrap calculations in general. Note also that because the likelihood is invariant under monotone reparametrizations so is the right hand side of (2.8).

2.3  Example  1. 

Table 1 illustrates the exact, standard and bootstrap confidence intervals for a familiar problem. The data $x_1, x_2, \ldots, x_n$ are i.i.d. N(0,1). The parameter of interest is $\theta = \mathrm{Var}(x_i)$. Level $1-2\alpha$ confidence intervals are to be based on the unbiased estimate $\hat\theta = \sum(x_i - \bar x)^2/(n-1)$. The sample size n was taken to be 20 and $\alpha = .05$. The exact interval is based on inverting the pivotal $\hat\theta/\theta$ around its chi-squared (n-1) distribution. The standard interval (line 2) is of the form (1.1) with $\hat\sigma = \hat\theta(2/n)^{1/2}$, the estimated asymptotic standard error of $\hat\theta$. The BCa interval (line 5) is based on formula (2.5). The BC interval (line 4) is based on (2.5) with a equal to 0, and the percentile interval (line 3) has a and $z_0$ equal to 0. The bootstrapping was performed parametrically, that is, resampling was done from $N(0, \hat\theta)$. The remaining lines are discussed in section 4. The lower and upper values in Table 1 refer to averages over 300 Monte Carlo simulations of the intervals. The level column indicates the proportion of trials in which each interval didn't contain the true value $\theta = 1$.


Table 1

Confidence intervals for the variance

                                      Average   Average   Level (%)
                                      Lower     Upper

                  (1)  Exact           .630      1.878     10.0
                  (2)  Standard        .466      1.531     11.0

Parametric        (3)  Percentile      .520      1.585     10.7
                  (4)  BC              .578      1.670     10.7
                  (5)  BCa             .628      1.860      9.7
                  (6)  BCa°            .629      1.877     10.0

Non-parametric    (7)  Percentile      .484      1.363     24.3
                  (8)  BC              .592      1.467     19.3
                  (9)  BCa             .617      1.524     19.3
                  (10) BCa°            .633      1.540     18.7

Of the intervals (1)-(5), only the BCa interval captures the asymmetry of the exact interval. The standard interval (2) undercovers on the right but overcovers on the left so the overall level is about right. This illustrates why coverage alone is not a good way to assess confidence intervals. Efron (1984b) also considers this example and shows that to a high order of approximation one can transform the problem into form (2.5) with $z_0 = .1082$ and $a = (1/6)(8/19)^{1/2} = .1081$. Hence it is not surprising that the percentile and BC intervals perform poorly because the bias and acceleration components are non-negligible.
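
The two constants quoted above can be checked numerically. The sketch below assumes that $\hat\theta/\theta$ follows a chi-squared (n-1) distribution divided by n-1, so that the skewness is $(8/(n-1))^{1/2}$ and $P(\hat\theta^* < \hat\theta)$ is the chi-squared (n-1) cdf evaluated at n-1. The series used for the chi-squared cdf and the helper name `chi2_cdf` are choices made for this illustration.

```python
import math
from statistics import NormalDist

def chi2_cdf(x, df):
    """Chi-squared cdf via the power series for the regularized
    lower incomplete gamma function P(df/2, x/2)."""
    s = df / 2.0
    y = x / 2.0
    term = 1.0
    total = 1.0
    k = 1
    while term > 1e-15 * total:
        term *= y / (s + k)
        total += term
        k += 1
    return total * math.exp(-y + s * math.log(y) - math.lgamma(s + 1.0))

n = 20
df = n - 1
# acceleration: one sixth of the skewness of a chi-squared(df) variable
a = math.sqrt(8.0 / df) / 6.0
# bias correction: Phi^{-1} of P(theta_hat* < theta_hat) = P(chi2_df < df)
z0 = NormalDist().inv_cdf(chi2_cdf(df, df))
```

Both values come out close to .108, in line with the figures quoted from Efron.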

Remarks. 

a) Efron begins by assuming that only $\hat\theta$ has been observed, having density $f_\theta$. Bootstrap values $\hat\theta^*$ are generated from $f_{\hat\theta}$. We have assumed that a data vector y has been observed but confidence intervals will be based only on the m.l.e. $\hat\theta$. The two notions are equivalent and it is easy to see that the distribution of $\hat\theta^*$ for $y^* \sim F_{\hat\theta}$ is $f_{\hat\theta}$. By starting with the data vector y, the one-parameter, multi-parameter and non-parametric problems can all be presented in a unified fashion.

b) Let $l_y(\theta)$ be the log likelihood for $\theta$ based on y. Then as Efron notes (Remark F), $\dot l_y(\theta)$ could be used in place of $\dot l_\theta(\hat\theta)$ in the formula for a, for their skewnesses differ by only $O_p(1/n)$. The formula based on $\dot l_y(\theta)$ will sometimes be easier to compute in the one-parameter case and is used in the multi-parameter and non-parametric problems in Sections 3 and 4.

2.4  A different view of the BCa interval: the BCa° Interval.

It seems that the computation of the bootstrap distribution $\hat G$ alleviates the need to know g(.), yet the second order correctness of the BCa interval suggests that a g(.) always exists approximately satisfying (2.5). Indeed this is the case as we will show in this section.

Let $l_y(\theta)$ be the log likelihood for $\theta$ based on y. Let $\kappa_2(\theta) = -E(d^2 l_y(\theta)/d\theta^2)$ be the expected Fisher information for $\theta$ and let $\sigma = [\kappa_2(\theta)]^{-1/2}$. Then the variance stabilizing transformation for $\hat\theta$ is $g_1(\hat\theta)$ where

    $g_1(t) = \int_c^t [\kappa_2(u)]^{1/2}\,du$    (2.9)

Let $g_A(s) = (e^{As} - 1)/A$, a skewness reducing transformation for strategically chosen A. And finally let $g(t) = g_A(g_1(t))$. Then the following theorem asserts that this g(.) puts any one parameter problem into approximately form (2.5).


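Theorem 2.1

If $\hat\theta \sim f_\theta$, and g(t) is as defined above, then with regularity conditions on the derivatives of the log-likelihood,

    $E(g(\hat\theta) - g(\theta)) = -z_0 + O(n^{-1})$

and

    $\mathrm{Var}(g(\hat\theta) - g(\theta)) = (1 + A g(\theta))^2 + O(n^{-1})$

Furthermore, if $A = \mathrm{SKEW}_{\theta=\hat\theta}(\dot l_\theta(\hat\theta))/6$, then

    $\mathrm{SKEW}(g(\hat\theta) - g(\theta)) = O(n^{-1})$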


What use is theorem 2.1? For one, it enables us to construct a confidence interval on the original $\theta$ scale. For simplicity, choose c in (2.9) so that $g_1(\hat\theta) = 0$ and hence $g(\hat\theta) = 0$. If (2.5) holds, then Efron shows that the endpoints of the correct interval on the g-scale are

    $g(\hat\theta) + (1 + a g(\hat\theta))\,(z_0 + z^{(\alpha)})/(1 - a(z_0 + z^{(\alpha)}))$    (2.10)

which equals $(z_0 + z^{(\alpha)})/(1 - a(z_0 + z^{(\alpha)}))$ since $g(\hat\theta) = 0$. The corresponding endpoints on the $\theta$ scale are thus

    $g^{-1}\big((z_0 + z^{(\alpha)})/(1 - a(z_0 + z^{(\alpha)}))\big)$    (2.11)

We will call this interval the BCa° interval and denote its endpoints by $\theta_{BCa°}[\alpha]$. Given theorem 2.1, it is not surprising that the endpoints of BCa° and BCa agree up to $O_p(n^{-1})$.

Theorem 2.2

    $(\theta_{BCa°}[\alpha] - \theta_{BCa}[\alpha])/\hat\sigma = O_p(n^{-1})$

Together with Efron's result (5.4), it also establishes the second order correctness of the BCa° interval.

Note that the BCa° interval, like the BCa interval, maps in the obvious way under reparametrization because the variance stabilizing transformation also maps correctly.


2.5  Example  1  continued. 

Line 6 in Table 1 shows the results of the BCa° interval applied to the variance problem. The overall results are very similar to the BCa numbers and on an individual basis the BCa° and the BCa intervals were very close. We used the values $z_0 = .1082$ and $a = (1/6)(8/19)^{1/2} = .1081$ computed analytically by Efron. The transformation $g_1(s)$ works out to $[(n-1)/2]^{1/2}\log(s)$ and hence $g(t) = g_a(g_1(t)) = k_1 t^c + k_2$ where $c = [(n-1)/2]^{1/2} a \approx 1/3$. Thus the procedure has reproduced the Wilson-Hilferty cube root transformation. Efron (1984b, Remark E) makes a similar calculation.
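
For the variance problem the BCa° endpoints have a closed form, which the following sketch evaluates. It assumes the transformation $g_1(t) = [(n-1)/2]^{1/2}\log(t/\hat\theta)$ (the lower limit chosen so that $g_1(\hat\theta) = 0$), $g_a(s) = (e^{as} - 1)/a$, and the values of $z_0$ and a quoted in the text; the function name is hypothetical.

```python
import math
from statistics import NormalDist

def bca0_variance_interval(theta_hat, n, z0, a, alpha):
    """BCa-zero interval for a normal variance, with no bootstrap sampling."""
    scale = math.sqrt((n - 1) / 2.0)   # g1(t) = scale * log(t / theta_hat)

    def endpoint(level):
        z = NormalDist().inv_cdf(level)
        w = (z0 + z) / (1.0 - a * (z0 + z))   # endpoint on the g scale
        g1 = math.log(1.0 + a * w) / a        # invert g_a
        return theta_hat * math.exp(g1 / scale)  # invert g1

    return endpoint(alpha), endpoint(1.0 - alpha)

# theta_hat = 1 is used because the simulations have true variance 1
lo, hi = bca0_variance_interval(1.0, 20, 0.1082, 0.1081, 0.05)
```

With $\hat\theta = 1$ the endpoints land near .63 and 1.88, close to the averages reported for the exact and BCa° intervals in Table 1.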
2.6  Example 2.  The correlation coefficient.


As a second example we consider the correlation coefficient problem discussed in Efron and Hinkley (1977). The data $(x_i, y_i)$ are i.i.d. bivariate normal with means 0, variance 1 and correlation $\theta$. We will base central 90% confidence intervals for $\theta$ on the m.l.e. $\hat\theta$. Note that the sample correlation $\hat\rho = \sum x_i y_i/(\sum x_i^2 \sum y_i^2)^{1/2}$ is not the m.l.e. Standard calculations show $a = -(1/3)\,\theta(3 + \theta^2)/[n^{1/2}(1 + \theta^2)^{3/2}]$. We will consider the case n = 15, $\theta = .9$ for which $a = -.12119$. Table 2 shows the results of 300 Monte Carlo runs for a number of intervals.
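
As a quick arithmetic check, the acceleration formula for this example can be evaluated directly; the function name below is hypothetical.

```python
import math

def correlation_acceleration(theta, n):
    """Acceleration constant a for the bivariate normal correlation
    problem with known zero means and unit variances."""
    numerator = theta * (3.0 + theta ** 2)
    denominator = 3.0 * math.sqrt(n) * (1.0 + theta ** 2) ** 1.5
    return -numerator / denominator

a = correlation_acceleration(0.9, 15)
```

The value agrees with the figure of -.12119 given in the text to the digits shown.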


Table 2

Results for correlation coefficient example.

                                              Average   Average   Level (%)
                                              Lower     Upper

Standard (based on $\hat\rho$)                 .816      .954       7.0
Standard (based on $\tanh^{-1}(\hat\rho)$)     .757      .958       7.3
Percentile                                     .761      .930      18.0
BC                                             .742      .922      23.3
BCa                                            .701      .914      29.3
BCa°                                           .763      .931      14.0

The first two intervals are based on the sample correlation coefficient (using the observed Fisher information for the variance). The second interval was obtained by transforming by $\tanh^{-1}$, computing the interval, then transforming back. The bootstrap intervals are all based on $\hat\theta$ and parametric bootstrap sampling. The variance stabilizing transformation turns out to be

    $g_1(\theta) = n^{1/2}\{2^{1/2}\tanh^{-1}[2^{1/2}\theta/(1+\theta^2)^{1/2}] - \tanh^{-1}[\theta/(1+\theta^2)^{1/2}]\}$    (2.12)

whose derivative is $[n(1+\theta^2)]^{1/2}/(1-\theta^2)$, the square root of the expected Fisher information.

The results are surprising. The BC and BCa intervals seem to pull the percentile interval in the wrong direction and hence the coverage gets worse. The BCa° interval performs quite well and seems to agree with the interval based on the $\tanh^{-1}$ transformation.


2.7  More  on  the  transformations. 

Recall the discussion of the BCa interval in section 2.2. A monotone transformation g(.) that mapped the problem into the form $g(\hat\theta) - g(\theta) \sim N(-z_0, (1 + a g(\theta))^2)$ was assumed to exist. Let $\hat\phi = g(\hat\theta)$ and $\phi = g(\theta)$. Once the problem was mapped to the $\phi$ scale, the transformation $(1/a)\log(1 + at)$ was used to further map the problem into a translation family and thereby obtain an exact confidence interval. The two transformations were then inverted to produce the desired interval on the $\theta$ scale. This is summarized in Figure 1.

Figure 1.

Transformations implicitly used by the BCa interval

    parameter:   $\theta$   -->   $\phi = g(\theta)$   -->   $\zeta = (1/a)\log(1 + a g(\theta))$

    statistic:   $\hat\theta \sim f_\theta$   -->   $\hat\phi - \phi \sim N(-z_0, (1 + a\phi)^2)$   -->   $\hat\zeta = \zeta + (1/a)\log(1 + a(Z - z_0))$

The BCa procedure automatically achieves this working only on the $\theta$ scale with no knowledge of g(.). The BCa° interval, on the other hand, gives an explicit construction for g(.), namely $g(t) = g_a(g_1(t))$ where $g_1(t) = \int_c^t [\kappa_2(u)]^{1/2}\,du$ and $g_a(t) = (e^{at} - 1)/a$. Notice that the transformation $(e^{at} - 1)/a$ is just the inverse of the transformation $(1/a)\log(1 + at)$. Hence we have a simpler description of the intervals: the transformation $g_1(t)$ is used to map the problem into the translation form $\hat\zeta = \zeta + (1/a)\log(1 + a(Z - z_0))$. The BCa° procedure computes $g_1(t)$ explicitly while the BCa procedure avoids computation of $g_1(t)$ through use of the bootstrap distribution $\hat G$.
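
The two algebraic facts used in this section can be checked numerically. The sketch below assumes the standardized reading of the normal model, in which $(\hat\phi - \phi)/(1 + a\phi)$ is distributed as $Z - z_0$; the particular values of $\phi$ and z are arbitrary choices for illustration.

```python
import math

def g_a(t, a):
    """Skewness reducing transformation (e^{at} - 1) / a."""
    return (math.exp(a * t) - 1.0) / a

def h(t, a):
    """Transformation to the translation scale, (1/a) log(1 + at)."""
    return math.log(1.0 + a * t) / a

a, z0 = 0.1081, 0.1082   # values quoted in Example 1
phi = 0.7                # arbitrary parameter value on the g scale
z = -0.4                 # stands in for one N(0,1) draw

# statistic on the g scale under the standardized model (an assumption)
phi_hat = phi + (1.0 + a * phi) * (z - z0)

# on the h scale the statistic is the parameter plus a shift W
shift = h(phi_hat, a) - h(phi, a)
w = h(z - z0, a)

# g_a undoes h
round_trip = g_a(h(1.3, a), a)
```

The shift does not depend on $\phi$, which is what makes the transformed problem a translation family.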
3.  Confidence intervals in multiparameter problems.


In  section  2  we  concentrated  on  one-parameter  problems  although  early  on  we  discussed  the 
multiparameter  parametric  bootstrap.  Here  we  will  briefly  describe  the  extension  of  the  BCa  and  BCa° 

intervals  to  multiparameter  problems.  The  main  purpose  of  the  discussion  will  be  to  provide  a  framework 

for  the  non-parametric  problem  addressed  in  the  next  section. 

Suppose that our unknown probability mechanism is $F_\eta$ where $\eta$ is a k dimensional parameter. Denote the (real-valued) parameter of interest by $\theta = t(\eta)$. In order to apply the confidence interval procedures of section 2, we must first reduce the problem to a one-parameter problem. We will follow Efron and utilize Stein's least favourable family for this purpose.

Denote the density of $F_\eta$ by $f_\eta$ and let the m.l.e. of $\eta$ be $\hat\eta$. Let $I_{\hat\eta}$ be the k by k matrix with ijth entry $-(\partial^2/\partial\eta_i\partial\eta_j)\log f_\eta$ evaluated at $\eta = \hat\eta$. Let $\hat\nabla$ be the gradient vector of $\theta = t(\eta)$ evaluated at $\hat\eta$, $\hat\nabla_i = (\partial/\partial\eta_i)\,t(\eta)$ at $\eta = \hat\eta$. The least favourable direction through $\hat\eta$ is defined to be

    $\hat\mu = (I_{\hat\eta})^{-1}\hat\nabla$    (3.1)

The least favourable family $\hat F$ is the one-dimensional subfamily of $F_\eta$ passing through $\hat\eta$ in the direction $\hat\mu$:

    $\hat F = \{f_{\hat\eta + \lambda\hat\mu}\}$    (3.2)

Note that $\hat\eta$ and $\hat\mu$ are fixed, and $\lambda$ is the parameter of the family. Why is this family called least favourable? Roughly speaking, this family points in the direction that $\theta$ is changing fastest in the information metric $(I_{\hat\eta})^{-1}$. More formally, consider estimation of $\theta(\lambda) = t(\hat\eta + \lambda\hat\mu)$ in the family $f_{\hat\eta + \lambda\hat\mu}$. One can show that observed Fisher information for $\theta(\lambda)$ in this problem is the same as that for $\theta = t(\eta)$ in the original k dimensional problem. Furthermore, any other subfamily has a greater Fisher information for $\theta$. In this asymptotic sense the reduction of the full family to the least favourable family is the only reduction in which estimation of $\theta$ is not made artificially easier. Figure 2 illustrates the least favourable family.


Figure 2.

Stein's least favourable family. $\hat\eta$ = mle, $\hat\theta = t(\hat\eta)$, $C_{\hat\theta} = \{\eta \mid t(\eta) = \hat\theta\}$, the level surface of constant $\theta$.


Tibshirani and Wasserman (1985) and DiCiccio and Tibshirani (1985) show that the least favourable family passes through $\hat\eta$ in the same direction as the profile likelihood and also that the two families differ by only $O_p(1/n)$.

Given this reduction we can now apply the BCa method, acting as if our problem is the one parameter problem $f_{\hat\eta + \lambda\hat\mu}$. The algorithm of section 2.2 can be used with resampling performed parametrically from the m.l.e. $F_{\hat\eta}$ (corresponding to the one dimensional m.l.e. $\hat\lambda = 0$). The bias constant $z_0$ is estimated by $\Phi^{-1}(\hat G(\hat\theta))$ as before. The acceleration constant a will be different than before, however; it will involve the skewness of the log-likelihood in the least favourable family:

    $a = \mathrm{SKEW}_{\lambda=0}\big((\partial/\partial\lambda)\log f_{\hat\eta + \lambda\hat\mu}\big)/6$    (3.3)

Except for some simple cases, estimation of a will require bootstrap computations. Fortunately, an explicit formula for a will be available in the non-parametric case (next section).

The BCa° method can also be used in this setting. Its definition is much the same as before. Here we use $g_1(t) = \int_c^t [\kappa_2^\lambda(u)]^{1/2}\,du$, where $\kappa_2^\lambda(u)$ is the expected Fisher information for $\lambda$ in the family $f_{\hat\eta + \lambda\hat\mu}$, and $g_a(t) = (e^{at} - 1)/a$ as before. Using formula (3.3) for a and $z_0 = \Phi^{-1}(\hat G(\hat\theta))$ we obtain an interval for $\lambda$. Finally this gives an interval for $\theta$ through the relationship $\theta(\lambda) = t(\hat\eta + \lambda\hat\mu)$. Note that $g_1(t)$ will be difficult to calculate in general but like a, it is easily computed in the non-parametric case.

We have constructed the BCa and BCa° intervals for multiparameter problems by extending the one-parameter definition to the least favourable family. To justify their use we need to show that in some sense they are second order correct. It turns out that a 'correct' interval is difficult to define; instead, we can resort to the weaker requirement that each of the intervals err in their coverage only by $O(1/n)$. Formally,

    $\mathrm{Prob}\{\theta_{BCa}[\alpha] < \theta < \theta_{BCa}[1-\alpha]\} = 1 - 2\alpha + O(1/n)$    (3.4)

and similarly for $\theta_{BCa°}[\alpha]$. We conjecture this result and also

    $(\theta_{BCa°}[\alpha] - \theta_{BCa}[\alpha])/\hat\sigma = O_p(n^{-1})$    (3.5)

but so far we have been unable to prove these conjectures.


4.  Non-parametric  problems. 


If we were to approach the non-parametric problem in its most general form we would have to consider all possible distributions $F_\eta$, that is, let $\eta$ be infinite dimensional. This would obviously be infeasible. Following Efron, we simplify the problem substantially by assuming that $F_\eta$ has support only on the observed data $x_1, x_2, \ldots, x_n$. This makes the problem finite dimensional and the approach of section 3 can be used.

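Consider the data to be fixed and let $\eta_i = \log(\mathrm{Prob}(X = x_i))$, $i = 1, 2, \ldots, n$. We can describe any realization from $F_\eta$ by $P^*$ where $P_i^* = \#\{x_j^* = x_i\}/n$. Then $F_\eta$ is a rescaled multinomial distribution, that is $P^* \sim \mathrm{Mult}(n, e^\eta)/n$. The observed sample gives rise to $\hat\eta = \log(P°)$ where $P° = (1/n, 1/n, \ldots, 1/n)^t$ and hence $F_{\hat\eta} \sim \mathrm{Mult}(n, P°)/n$. The least favourable family through $\hat\eta$ turns out to be $P^* \sim \mathrm{Mult}(n, w^\lambda)/n$, where $w_i^\lambda = e^{\lambda U_i}/\sum_j e^{\lambda U_j}$ and

    $U_i = \lim_{\epsilon \to 0}\,[t((1-\epsilon)\hat F_n + \epsilon\delta_i) - t(\hat F_n)]/\epsilon$    (4.1)

(See Efron 1984b, section 7). Here $\delta_i$ is a point mass at $x_i$ and the $U_i$ are called the empirical influence components of $\hat\theta = t(\hat F_n)$.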

We now have almost all we need to compute the BCa interval for the non-parametric case. Resampling is done from $F_{\hat\eta} \sim \mathrm{Mult}(n, P°)/n$ and this is equivalent to sampling with replacement from $x_1, x_2, \ldots, x_n$. The bias constant is estimated as before. We require only an estimate of the acceleration a. Applying formula (3.3) to the multinomial family gives

    $a = \sum U_i^3 / [6(\sum U_i^2)^{3/2}]$    (4.2)
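
The empirical influence components and the resulting acceleration can be approximated by finite differences. The sketch below assumes the statistic is available as a function of a weight vector on the data and uses $\epsilon = 1/(n+1)$, the value used later in Example 3; the function names and the toy data are hypothetical.

```python
def empirical_influence(statistic, n, eps):
    """Finite-difference version of the empirical influence components:
    U_i ~ [t((1 - eps) P0 + eps * delta_i) - t(P0)] / eps."""
    p0 = [1.0 / n] * n
    base = statistic(p0)
    u = []
    for i in range(n):
        p = [(1.0 - eps) * w for w in p0]  # shrink all weights
        p[i] += eps                        # add a point mass at x_i
        u.append((statistic(p) - base) / eps)
    return u

def acceleration(u):
    """Non-parametric acceleration: sum U^3 / (6 * (sum U^2)^{3/2})."""
    s2 = sum(v * v for v in u)
    s3 = sum(v ** 3 for v in u)
    return s3 / (6.0 * s2 ** 1.5)

data = [1.0, 2.0, 4.0, 7.0, 11.0]   # hypothetical observations

def weighted_mean(p):
    """The mean written as a functional of the weights p."""
    return sum(w * x for w, x in zip(p, data))

u = empirical_influence(weighted_mean, len(data), 1.0 / (len(data) + 1))
a = acceleration(u)
```

For the mean the components are simply $x_i - \bar x$ for any $\epsilon$, which makes it a convenient check; for a non-linear statistic the finite difference is only an approximation to the limit.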
Table 1 line 9 shows the results of the non-parametric BCa interval applied to the variance problem. It outperforms the (non-parametric) percentile and bias-corrected percentile intervals but doesn't fully capture the asymmetry of the exact interval. This is due to the short tails of the bootstrap distribution of $\hat\theta$.

The BCa° interval can also be used here. The transformation $g_1(t) = \int_c^t [\kappa_2^\lambda(s)]^{1/2}\,ds$ requires an estimate of the expected Fisher information $\kappa_2^\lambda(s)$ for the multinomial subfamily (4.1). Straightforward calculations show that

    $\kappa_2^\lambda(s) = n\,[\sum U_i^2 e^{sU_i}/\sum e^{sU_i} - (\sum U_i e^{sU_i}/\sum e^{sU_i})^2]$    (4.3)

A simple numerical integration (like the trapezoid rule) can then be used to compute $g_1(t)$. Note that $\kappa_2^\lambda(s)$ is a non-negative function by Jensen's inequality and is in fact positive unless all the $U_i$'s are equal. Hence $g_1(t)$ will be monotone increasing and invertible.
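
The information function and its trapezoid-rule integral can be sketched as below. The lower limit of integration is taken to be 0 (so that the transformation vanishes at $\lambda = 0$); that choice, the number of steps and the toy influence values are assumptions made for the illustration.

```python
import math

def kappa2(s, u):
    """Expected Fisher information for the multinomial subfamily:
    n times the variance of U under weights proportional to exp(s * U_i)."""
    n = len(u)
    w = [math.exp(s * ui) for ui in u]
    tot = sum(w)
    m1 = sum(wi * ui for wi, ui in zip(w, u)) / tot
    m2 = sum(wi * ui * ui for wi, ui in zip(w, u)) / tot
    return n * (m2 - m1 * m1)

def g1(t, u, steps=200):
    """Trapezoid approximation to the integral from 0 to t of sqrt(kappa2(s)).
    A negative t gives a negative value because the step is then negative."""
    def root(s):
        return math.sqrt(max(kappa2(s, u), 0.0))  # guard against rounding below 0
    step = t / steps
    total = 0.5 * (root(0.0) + root(t))
    for k in range(1, steps):
        total += root(k * step)
    return total * step

u = [-4.0, -3.0, -1.0, 2.0, 6.0]   # hypothetical influence components (sum to 0)
info_at_zero = kappa2(0.0, u)
```

At s = 0 the weights are uniform, so the information reduces to the sum of squares of the $U_i$.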

Line 10 of Table 1 shows the results of the BCa° procedure applied to the variance problem. As in the parametric case the results were very similar on an interval to interval basis to the BCa results.

Actually, computation of the BCa° intervals doesn't even require bootstrap sampling! The only component of the procedure that seems to require it is the estimation of $z_0$. But Efron (1984b, section 7) provides an approximation for $z_0$ based on first and second order empirical influences. Let V be the n by n matrix of second order influences, define $z_{01} = (1/6)\sum U_i^3/[\sum U_i^2]^{3/2}$ (the approximation for a) and let $z_{02} = [U^t V U/\|U\|^2 - \mathrm{trace}(V)]/(2n\|U\|)$. Then a good approximation for $z_0$ is

    $z_0 \approx \Phi^{-1}(2\Phi(z_{01})\Phi(z_{02}))$    (4.4)

Using the following method due to Tim Hesterberg of Stanford, $z_{02}$ can be computed with only 2 additional evaluations of the statistic. Let $U(i,\epsilon)$ equal the expression in the right hand side of (4.1) for some small positive $\epsilon$. Let $D(i,\epsilon) = U(i,\epsilon) - \bar U(\epsilon)$ where $\bar U(\epsilon)$ is the mean of the $U(i,\epsilon)$'s. It is easy to show that $\mathrm{trace}(V) \approx (2/\epsilon)\sum_i U(i,\epsilon)$. Using the notation $\theta(P^*)$ to denote $\theta = t(F)$ evaluated for the distribution F putting mass $P_i^*$ on $x_i$ (see e.g. Efron 1981), one can also show that $U^t V U \approx [\theta(P° + \epsilon U) + \theta(P° - \epsilon U) - 2\theta(P°)]/\epsilon^2$.

Thus a total of n+2 evaluations of the statistic are required to compute a and $z_0$. Note however that (4.4) is only an approximation; Hesterberg is presently studying its accuracy.

If the BCa and BCa° intervals can be shown to be second order correct, then they will also be second order correct in the non-parametric setting, if it is assumed that the number of categories in the support of the multinomial stays fixed as n goes to infinity. Combined with the assumption that the support of the distribution is confined to $x_1, x_2, \ldots, x_n$, this is a less than ideal definition of "non-parametric second order correctness". We are currently looking at ways of making it more realistic.

Example  3.  The  Proportional  Hazards  model. 

For illustration we applied these methods to the proportional hazards model of Cox (1972). The data we chose were the mouse leukemia data analysed by Cox in that paper. They consist of the survival times ($y_i$) in weeks of mice in two groups ($x_i$), control (0) and treatment (1), as well as a censoring indicator ($\delta_i$). The partial likelihood estimator $\hat\beta$ was 1.51. We applied the confidence interval procedures by considering ($y_i, x_i, \delta_i$) as the sampling unit. Estimation of the BCa° interval requires writing the statistic as a functional statistic; this is not necessary for the BC interval because it only evaluates the statistic on bootstrap samples. We define the partial likelihood estimator for sample weights w, $\hat\beta(w)$, as the maximizer of


$PL_w = \prod_{i \in D} \exp(\beta \sum w_j x_j) \,/\, [\sum_{j \in R_i} w_j \exp(x_j \beta)]^{\sum w_j}$    (4.5)

where D is the set of indices of the failure times, $R_i$ is the set of indices of the items at risk before the ith failure, and each of the sums is over the items failing at the ith failure time. This definition is found in Tibshirani (1984). Finally, U and V were computed by substituting $\epsilon = 1/(n+1)$ into their definitions. Table 3 shows the results of the various non-parametric confidence procedures.


Table 3. Confidence intervals for the proportional hazards example

Standard      (34, 2.18)
Percentile    (33, 2.34)
BC            (35, 2.36)
BCa           (.75, 2.15)
BCa°          (37, 2.03)


Interestingly, the percentile and BC intervals shifted the standard interval to the right, but the negative acceleration (a = -.152) caused the BCa and BCa° intervals to shift back to the left. The BCa° is also somewhat shorter than the BCa interval.
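To make the functional form of the estimator concrete, here is a minimal sketch of a weighted log partial likelihood for a single covariate and its maximizer. It assumes the reading of (4.5) in which the items failing at each failure time contribute $\beta\sum w_j x_j$ in the numerator and $\sum w_j$ as the power of the weighted risk-set sum. The toy data are hypothetical (they are not the data analysed by Cox), and the ternary search is a stand-in for whatever optimizer was actually used.

```python
import math


def weighted_log_pl(beta, times, x, delta, w):
    """Log of the weighted partial likelihood for a scalar covariate."""
    n = len(times)
    total = 0.0
    failure_times = sorted({times[j] for j in range(n) if delta[j] == 1})
    for t in failure_times:
        failing = [j for j in range(n) if times[j] == t and delta[j] == 1]
        at_risk = [j for j in range(n) if times[j] >= t]
        sum_w = sum(w[j] for j in failing)
        sum_wx = sum(w[j] * x[j] for j in failing)
        risk = sum(w[j] * math.exp(beta * x[j]) for j in at_risk)
        total += beta * sum_wx - sum_w * math.log(risk)
    return total


def beta_hat(times, x, delta, w, lo=-10.0, hi=10.0, iters=200):
    """Maximize the concave log partial likelihood by ternary search on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if weighted_log_pl(m1, times, x, delta, w) < weighted_log_pl(m2, times, x, delta, w):
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)


# Hypothetical toy data: survival time, group (0 or 1), failure indicator.
toy_times = [1, 2, 3, 4, 5, 6, 7, 8]
toy_x = [0, 1, 0, 1, 0, 0, 1, 1]
toy_delta = [1, 1, 1, 0, 1, 1, 1, 0]

beta_equal = beta_hat(toy_times, toy_x, toy_delta, [1.0] * 8)
beta_scaled = beta_hat(toy_times, toy_x, toy_delta, [2.0] * 8)
```

Multiplying every weight by the same constant leaves the maximizer unchanged, so only the relative weights matter; perturbing one weight at a time is what the influence calculations require.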

5. Approximating the bootstrap distribution of a statistic.


The  results  of  sections  2  and  3  show  (and  conjecture)  respectively,  that 


$G^{-1}(\Phi(z[\alpha])), \qquad z[\alpha] = z_0 + (z_0 + z^{(\alpha)}) / (1 - a(z_0 + z^{(\alpha)}))$


and 


(5.1) 


differ by only $O_p(n^{-1})$. We can use this to estimate $G^{-1}(p)$ (for any p), without bootstrap sampling, as follows. First we find $z^{(\alpha)}$ such that $p = \Phi(z[\alpha])$, i.e. $z^{(\alpha)} = q/(1 + aq) - z_0$ with $q = \Phi^{-1}(p) - z_0$. Then we substitute this into (5.1) and thus get an approximation to $G^{-1}(p)$.
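The inversion step can be checked numerically. The sketch below assumes Efron's BCa form $z[\alpha] = z_0 + (z_0 + z^{(\alpha)})/(1 - a(z_0 + z^{(\alpha)}))$; the function names are hypothetical, the value of a is the acceleration quoted for the Cox example, and the value of $z_0$ is made up for illustration.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def bca_level(z_alpha, z0, a):
    """Forward map: z[alpha] = z0 + (z0 + z_alpha) / (1 - a * (z0 + z_alpha))."""
    w = z0 + z_alpha
    return z0 + w / (1.0 - a * w)


def z_alpha_for_probability(p, z0, a):
    """Inverse map: the z_alpha for which Phi(z[alpha]) equals p."""
    q = _STD_NORMAL.inv_cdf(p) - z0
    return q / (1.0 + a * q) - z0


a_example = -0.152   # acceleration reported for the Cox example
z0_example = 0.1     # hypothetical bias correction
round_trip = [
    _STD_NORMAL.cdf(bca_level(z_alpha_for_probability(p, z0_example, a_example),
                              z0_example, a_example))
    for p in (0.025, 0.5, 0.975)
]
```

Each entry of `round_trip` reproduces the probability it started from, which is the property needed before the resulting $z^{(\alpha)}$ is substituted into the endpoint formula.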


If instead we want a density that closely approximates the bootstrap histogram, we recall that $g(\hat\theta) = g(\theta) + \sigma(Z - z_0)$ where Z is a N(0,1) random variable. Hence a good approximating density is the density of $g^{-1}(g(\hat\theta) + \sigma(Z - z_0))$. After a little algebra this can be expressed as


Xs)  -  Y[(e0i<s*-1)/a  -tfcJePiW®  (k2(s))1'2 


(5.2)


where y is the density function of N(0,1). In the non-parametric case, (5.2) gives the density of X and must be multiplied by $dX/d\theta = n/k_{2X}(s)$ to obtain the density for $\hat\theta^*$.
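The "little algebra" is a change of variables: for an increasing g, the density of $g^{-1}(g(\hat\theta) + \sigma(Z - z_0))$ at s is the N(0,1) density evaluated at $(g(s) - g(\hat\theta))/\sigma + z_0$, multiplied by $g'(s)/\sigma$. The sketch below checks this numerically, assuming purely for illustration that $g(s) = (e^{As} - 1)/A$ (the transformation that appears in section 6). All numerical values and function names are hypothetical, and the code is not claimed to reproduce (5.2) term by term.

```python
import math


def std_normal_pdf(z):
    """Density of N(0,1)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def g_transform(s, big_a):
    """Assumed transformation g(s) = (exp(A*s) - 1) / A, increasing in s."""
    return (math.exp(big_a * s) - 1.0) / big_a


def approx_density(s, theta_hat, z0, sigma, big_a):
    """Density at s of g^{-1}(g(theta_hat) + sigma * (Z - z0)), Z ~ N(0,1)."""
    z = (g_transform(s, big_a) - g_transform(theta_hat, big_a)) / sigma + z0
    g_prime = math.exp(big_a * s)  # derivative of g at s
    return std_normal_pdf(z) * g_prime / sigma


def integrate(theta_hat, z0, sigma, big_a, lo=-40.0, hi=20.0, steps=60000):
    """Trapezoidal rule, used here only to confirm the density integrates to one."""
    h = (hi - lo) / steps
    total = 0.5 * (approx_density(lo, theta_hat, z0, sigma, big_a)
                   + approx_density(hi, theta_hat, z0, sigma, big_a))
    for k in range(1, steps):
        total += approx_density(lo + k * h, theta_hat, z0, sigma, big_a)
    return total * h


total_mass = integrate(theta_hat=1.5, z0=0.1, sigma=1.0, big_a=0.1)
```

Because the density is a normal density times the positive factor $g'(s)/\sigma$, it can never be negative, which is the first of the two advantages over Edgeworth expansions noted later in this section.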

For the Cox model example, Figure 3 shows a histogram of 1000 bootstrap values along with the approximating density j(s) (renormalized) and Table 4 shows the approximation based on (5.2). In both cases the agreement is quite good.


Figure 3. Bootstrap histogram and approximation based on (5.2)


This approximating procedure can be thought of as a refinement of the usual central limit theorem approximation $N(\theta, k_2(\theta)^{-1})$, correct to order $n^{-1/2}$. The new approximation

g"1  (N{g(Vzo/1+ag(Vl  (5-4) 

incorporates three order $n^{-1/2}$ components: g(.), $z_0$ and a. In a parametric setting, (5.2) could prove to be a useful alternative to an Edgeworth expansion. It has two distinct advantages over Edgeworth expansions: 1) it is always non-negative because g(.) is monotone increasing and 2) it is computable (albeit not often by hand) for general first order efficient statistics $\hat\theta$.

The reason that this procedure works in the non-parametric setting is that asymptotically, one has only to look at the bootstrap distribution of $\hat\theta^*$ projected onto U in order to compute G(.). It is easy to check that a (formula 4.2) equals the skewness of $P^{*T}U$ and that $z_0$ takes into account both this skewness and the curvature of the level surfaces near $P^0$.


6.  Proofs  of  theorems  2.1  and  2.2. 

Suppose that the parameter $\theta$ has been rescaled to be of order $n^{1/2}$ as in Efron's (1984b) (4.5). Assume also the regularity conditions in Efron's (4.4). Consider now

$\phi = g(\theta) = (e^{A\theta} - 1)/A$

where A is understood to be a constant of order $n^{-1/2}$. Then

$\hat\phi - \phi = (e^{A\theta}/A)(e^{A(\hat\theta - \theta)} - 1)$

and from the moments of $\hat\theta - \theta$ (see for example Welch 1965) it can be shown that

E(*-$)  -  (1/2)ne^ ((2k, !  +*00, V n1  ^A/ k2  +0(n-2)] 
varfo-#  -  ne^l/xg-fCXrr2)) 

*(♦-♦)  -  K^Z+sAn^/x^^n-i) 

■&(♦-<*  -  C(rrl) 

where $\gamma_1$ and $\gamma_2$ denote skewness and excess kurtosis and the k's are as defined in DiCiccio (1984). If the choice


A  -  -(I/^Pk,^^)/^ 


(6.4)


is  made,  then  y,($  -  $)  is  0D(n‘1).  By  the  relations  attributed  to  Bartlett,  *3+3icn +*30,-0  and  *3-2*30, +*2. 
it  follows  that  if  9  is  the  variance  stablized  parameter  with  *2»1 .  then 

A  -(I/6XK3  /  Kg3'2)  -  (1/ 6)(k3  /n?2)  (6.5) 


and 


$E(\hat\phi - \phi) = -z_0 + O(n^{-1})$

$\mathrm{var}(\hat\phi - \phi) = e^{2A\theta} + O(n^{-1})$

$\gamma_1(\hat\phi - \phi) = O(n^{-1})$    (6.6)

Thus $\hat\phi - \phi$ is, to second order, normally distributed with mean $-z_0$ and standard deviation $e^{A\theta} = 1 + A\phi$. Although $k_3$ at the true value $\theta_0$ is unknown, $k_3(\hat\theta)$ may be used in its place for the calculation of A, without altering the orders of the preceding error terms. This establishes theorem (2.1). Theorem (2.2) then follows immediately from Efron's (11.3). In fact (11.3) holds exactly for $\theta_{BCa°}[\alpha]$.

20. ABSTRACT

We study the "BCa" bootstrap procedure (Efron 1984) for constructing parametric and non-parametric confidence intervals. The BCa interval relies on the existence of a transformation that maps the problem into a "normal scaled transformation family". We show how to construct this transformation in general. Exploiting this, we derive an interval that equals the BCa interval to second order, computable without bootstrap sampling. As a further benefit, this construction provides a second order correct approximation to the bootstrap distribution of a statistic, computed without bootstrap sampling. Both the new interval and the approximation require only n+2 evaluations of the statistic, where n is the sample size.


